Overview
Data Structures
Warnings
| warnings | status | recommendation |
|---|---|---|
| company_type has 6,145 (32%) missing values | missing | judgement |
| company_size has 5,945 (31%) missing values | missing | judgement |
| gender has 4,519 (23.5%) missing values | missing | judgement |
| major_discipline has 2,815 (14.7%) missing values | missing | judgement |
| education_level has 461 (2.4%) missing values | missing | judgement |
| last_new_job has 424 (2.2%) missing values | missing | judgement |
| enrolled_university has 387 (2%) missing values | missing | judgement |
| experience has 65 (0.3%) missing values | missing | judgement |
| base_date has 23 (0.1%) missing values | missing | judgement |
| ids has high (1.00) cardinality; may be an identifier | cardinality | check |
| test has constant value “0” | cardinality | remove |
| base_date3 has constant value “2021-06-12 09:00:00” | cardinality | remove |
| test has 19,189 (100%) zeros | zero | check |
| city_dev_index has 13 (0.07%) zeros | zero | check |
| city_dev_index has 6 (0.03%) negatives | negative | check |
| training_hours has 986 (5.14%) outliers | outlier | judgement |
| city_dev_index has 36 (0.19%) outliers | outlier | judgement |
Variables
| variables | types | missing | cardinality | zero | minus | outlier |
|---|---|---|---|---|---|---|
| enrollee_id | character | | high | | | |
| city | factor | | | | | |
| city_dev_index | numeric | | | X | X | X |
| gender | factor | X | | | | |
| relevent_experience | factor | | | | | |
| enrolled_university | factor | X | | | | |
| education_level | ordered | X | | | | |
| major_discipline | factor | X | | | | |
| experience | ordered | X | | | | |
| company_size | ordered | X | | | | |
| company_type | factor | X | | | | |
| last_new_job | ordered | X | | | | |
| training_hours | integer | | | | | X |
| job_chnge | factor | | | | | |
| test | numeric | | constant | X | | |
| ids | character | | identifier | | | |
| base_date | Date | X | | | | |
| base_date2 | POSIXct | | | | | |
| base_date3 | POSIXct | | constant | | | |
Missing Values
List of Missing Values
| variables | missing_count | missing (%) | status | recommendation |
|---|---|---|---|---|
| company_type | 6,145 | 32% | Bad | Model-based imputation |
| company_size | 5,945 | 31% | Bad | Model-based imputation |
| gender | 4,519 | 23.5% | Bad | Model-based imputation |
| major_discipline | 2,815 | 14.7% | NotBad | Model-based imputation |
| education_level | 461 | 2.4% | Good | Deletion or imputation |
| last_new_job | 424 | 2.2% | Good | Deletion or imputation |
| enrolled_university | 387 | 2% | Good | Deletion or imputation |
| experience | 65 | 0.3% | Good | Deletion or imputation |
| base_date | 23 | 0.1% | Good | Deletion or imputation |
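The status column grades each variable by its share of missing values. A minimal Python sketch of that grading follows. The cut-offs (Good up to 5%, OK up to 10%, NotBad up to 20%, Bad up to 50%, Remove above that) are assumptions chosen to agree with the rows above, not thresholds documented by the tool that produced this report; the function names are hypothetical.

```python
# Assumed cut-offs on the missing ratio, chosen to agree with the table above.
GRADES = [(0.05, "Good"), (0.10, "OK"), (0.20, "NotBad"), (0.50, "Bad")]

def grade_missing(ratio):
    """Map a missing ratio in [0, 1] to a status label."""
    for limit, label in GRADES:
        if ratio <= limit:
            return label
    return "Remove"  # assumed label for columns that are mostly empty

def missing_summary(column):
    """Return (missing_count, missing_ratio, status); None marks a missing value."""
    missing = sum(1 for value in column if value is None)
    ratio = missing / len(column) if column else 0.0
    return missing, ratio, grade_missing(ratio)

N_ROWS = 19_189  # rows in the diagnosed data
for name, count in [("company_type", 6_145), ("major_discipline", 2_815), ("education_level", 461)]:
    print(name, f"{count / N_ROWS:.1%}", grade_missing(count / N_ROWS))
```

Applied to the counts in the table, the sketch reproduces the Bad, NotBad and Good labels for these three variables.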
Unique Values
Categorical Variables
Variables whose proportion of unique values exceeds 0.5, or that have exactly one unique value.
| variables | types | unique | unique (%) | status | recommendation |
|---|---|---|---|---|---|
| enrollee_id | character | 19,158 | 99.8% | high cardinality | Judgment |
| ids | character | 19,189 | 100% | identifier | Use as ID |
| base_date3 | POSIXct | 1 | 0% | constant | Remove Variable |
Numerical Variables
Variables with fewer than 5 unique values, or exactly one unique value.
| variables | types | unique | unique (%) | status | recommendation |
|---|---|---|---|---|---|
| test | numeric | 1 | 0% | constant | Remove Variable |
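The selection rules in the two captions above can be sketched in Python as follows. The high-cardinality rule (unique ratio above 0.5), the constant rule (one unique value) and the numeric rule (fewer than 5 unique values) come from the captions; treating a fully unique column as an identifier is an assumption inferred from the `ids` row, and the function names and the "few unique values" label are hypothetical.

```python
def flag_categorical(values):
    """Flag a categorical column by its unique values (None counts as a level, an assumption)."""
    if not values:
        return None
    unique = len(set(values))
    if unique == 1:
        return "constant"
    if unique == len(values):
        return "identifier"  # assumed rule: every row distinct, as for `ids`
    if unique / len(values) > 0.5:
        return "high cardinality"
    return None

def flag_numerical(values):
    """Flag a numeric column with exactly one unique value, or fewer than 5."""
    if not values:
        return None
    unique = len(set(values))
    if unique == 1:
        return "constant"
    if unique < 5:
        return "few unique values"  # hypothetical label
    return None

print(flag_categorical(["ID1", "ID2", "ID3"]), flag_numerical([0.0] * 10))
```

Checking the constant case first matters: a one-row column is both constant and fully unique, and the report's tables treat constants as the stronger signal.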
Categorical Variable Diagnosis
Top Ranks
| variables | levels | freq | ratio (%) |
|---|---|---|---|
| base_date | 2021-06-26 | 700 | 3.6 |
| base_date | 2021-07-10 | 688 | 3.6 |
| base_date | 2021-06-19 | 684 | 3.6 |
| base_date | 2021-07-03 | 671 | 3.5 |
| base_date | 2021-06-15 | 662 | 3.4 |
| base_date | 2021-07-09 | 662 | 3.4 |
| base_date | 2021-06-13 | 661 | 3.4 |
| base_date | 2021-06-21 | 655 | 3.4 |
| base_date | 2021-07-12 | 654 | 3.4 |
| base_date | 2021-06-16 | 645 | 3.4 |
| base_date | Other levels | 12,484 | 65.1 |
| base_date | Missing | 23 | 0.1 |
| base_date2 | 2021-06-12 09:00:20 | 696 | 3.6 |
| base_date2 | 2021-06-12 09:00:06 | 693 | 3.6 |
| base_date2 | 2021-06-12 09:00:12 | 692 | 3.6 |
| base_date2 | 2021-06-12 09:00:23 | 672 | 3.5 |
| base_date2 | 2021-06-12 09:00:19 | 664 | 3.5 |
| base_date2 | 2021-06-12 09:00:25 | 664 | 3.5 |
| base_date2 | 2021-06-12 09:00:14 | 660 | 3.4 |
| base_date2 | 2021-06-12 09:00:24 | 660 | 3.4 |
| base_date2 | 2021-06-12 09:00:28 | 656 | 3.4 |
| base_date2 | 2021-06-12 09:00:03 | 646 | 3.4 |
| base_date2 | Other levels | 12,486 | 65.1 |
| base_date3 | 2021-06-12 09:00:00 | 19,189 | 100.0 |
| city | city_103 | 4,361 | 22.7 |
| city | city_21 | 2,710 | 14.1 |
| city | city_16 | 1,535 | 8.0 |
| city | city_114 | 1,338 | 7.0 |
| city | city_160 | 848 | 4.4 |
| city | city_136 | 586 | 3.1 |
| city | city_67 | 431 | 2.2 |
| city | city_102 | 305 | 1.6 |
| city | city_75 | 305 | 1.6 |
| city | city_104 | 301 | 1.6 |
| city | Other levels | 6,469 | 33.7 |
| company_size | 50-99 | 3,090 | 16.1 |
| company_size | 100-499 | 2,578 | 13.4 |
| company_size | 10000+ | 2,022 | 10.5 |
| company_size | 10-49 | 1,474 | 7.7 |
| company_size | 1000-4999 | 1,331 | 6.9 |
| company_size | <10 | 1,308 | 6.8 |
| company_size | 500-999 | 878 | 4.6 |
| company_size | 5000-9999 | 563 | 2.9 |
| company_size | Missing | 5,945 | 31.0 |
| company_type | Pvt Ltd | 9,838 | 51.3 |
| company_type | Funded Startup | 1,002 | 5.2 |
| company_type | Public Sector | 957 | 5.0 |
| company_type | Early Stage Startup | 605 | 3.2 |
| company_type | NGO | 521 | 2.7 |
| company_type | Other | 121 | 0.6 |
| company_type | Missing | 6,145 | 32.0 |
| education_level | Graduate | 11,616 | 60.5 |
| education_level | Masters | 4,371 | 22.8 |
| education_level | High School | 2,017 | 10.5 |
| education_level | Phd | 415 | 2.2 |
| education_level | Primary School | 309 | 1.6 |
| education_level | Missing | 461 | 2.4 |
| enrolled_university | no_enrollment | 13,839 | 72.1 |
| enrolled_university | Full time course | 3,763 | 19.6 |
| enrolled_university | Part time course | 1,200 | 6.3 |
| enrolled_university | Missing | 387 | 2.0 |
| enrollee_id | 16814 | 2 | 0.0 |
| enrollee_id | 18272 | 2 | 0.0 |
| enrollee_id | 19249 | 2 | 0.0 |
| enrollee_id | 20866 | 2 | 0.0 |
| enrollee_id | 20881 | 2 | 0.0 |
| enrollee_id | 21563 | 2 | 0.0 |
| enrollee_id | 21634 | 2 | 0.0 |
| enrollee_id | 22899 | 2 | 0.0 |
| enrollee_id | 23825 | 2 | 0.0 |
| enrollee_id | 24936 | 2 | 0.0 |
| enrollee_id | Other levels | 19,169 | 99.9 |
| experience | >20 | 3,293 | 17.2 |
| experience | 5 | 1,434 | 7.5 |
| experience | 4 | 1,405 | 7.3 |
| experience | 3 | 1,359 | 7.1 |
| experience | 6 | 1,218 | 6.3 |
| experience | 2 | 1,132 | 5.9 |
| experience | 7 | 1,029 | 5.4 |
| experience | 10 | 986 | 5.1 |
| experience | 9 | 982 | 5.1 |
| experience | 8 | 802 | 4.2 |
| experience | Other levels | 5,484 | 28.6 |
| experience | Missing | 65 | 0.3 |
| gender | Male | 13,241 | 69.0 |
| gender | Female | 1,238 | 6.5 |
| gender | Other | 191 | 1.0 |
| gender | Missing | 4,519 | 23.5 |
| ids | ID1 | 1 | 0.0 |
| ids | ID10 | 1 | 0.0 |
| ids | ID100 | 1 | 0.0 |
| ids | ID1000 | 1 | 0.0 |
| ids | ID10000 | 1 | 0.0 |
| ids | ID10001 | 1 | 0.0 |
| ids | ID10002 | 1 | 0.0 |
| ids | ID10003 | 1 | 0.0 |
| ids | ID10004 | 1 | 0.0 |
| ids | ID10005 | 1 | 0.0 |
| ids | Other levels | 19,179 | 99.9 |
| job_chnge | No | 14,406 | 75.1 |
| job_chnge | Yes | 4,783 | 24.9 |
| last_new_job | 1 | 8,054 | 42.0 |
| last_new_job | >4 | 3,296 | 17.2 |
| last_new_job | 2 | 2,903 | 15.1 |
| last_new_job | never | 2,458 | 12.8 |
| last_new_job | 4 | 1,030 | 5.4 |
| last_new_job | 3 | 1,024 | 5.3 |
| last_new_job | Missing | 424 | 2.2 |
| major_discipline | STEM | 14,518 | 75.7 |
| major_discipline | Humanities | 670 | 3.5 |
| major_discipline | Other | 382 | 2.0 |
| major_discipline | Business Degree | 328 | 1.7 |
| major_discipline | Arts | 253 | 1.3 |
| major_discipline | No Major | 223 | 1.2 |
| major_discipline | Missing | 2,815 | 14.7 |
| relevent_experience | Has relevent experience | 13,814 | 72.0 |
| relevent_experience | No relevent experience | 5,375 | 28.0 |
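Each block of the Top Ranks table lists up to ten of the most frequent levels, lumps the remaining levels into one "Other levels" row, and reports missing values in a separate row, with every ratio taken over all rows (19,189 here). A minimal Python sketch of that layout using `collections.Counter`; the function name `top_ranks` is hypothetical, and ties are broken by first occurrence, which may not match the ordering used in the report.

```python
from collections import Counter

def top_ranks(values, k=10):
    """Rows of (level, freq, ratio %) as in the Top Ranks table; None marks a missing value."""
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    missing = n - sum(counts.values())
    top = counts.most_common(k)  # ties keep first-seen order
    rows = [(level, freq, round(100 * freq / n, 1)) for level, freq in top]
    other = sum(counts.values()) - sum(freq for _, freq in top)
    if other:
        rows.append(("Other levels", other, round(100 * other / n, 1)))
    if missing:
        rows.append(("Missing", missing, round(100 * missing / n, 1)))
    return rows

for row in top_ranks(["a"] * 5 + ["b"] * 3 + ["c"] + [None], k=2):
    print(row)
```

Because the "Other levels" and "Missing" rows absorb everything outside the top k, the freq column of each block always sums to the row count; the base_date, city and experience blocks above all total 19,189.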
Numerical Variable Diagnosis
Distributions
| variables | min | Q1 | mean | median | Q3 | max | zero | minus | outlier |
|---|---|---|---|---|---|---|---|---|---|
| city_dev_index | -0.5 | 0.74 | 0.83 | 0.9 | 0.92 | 0.95 | 13 | 6 | 36 |
| training_hours | 1.0 | 23.00 | 65.36 | 47.0 | 88.00 | 336.00 | 0 | 0 | 986 |
| test | 0.0 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 19,189 | 0 | 0 |
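The Distributions table reports, per numeric variable, the quartiles, mean, extremes, and counts of zeros, negative (minus) values and outliers. The sketch below reproduces those columns with the standard `statistics` module. It assumes outliers are values beyond Tukey's fences (1.5 × IQR below Q1 or above Q3) computed on `method="inclusive"` quartiles; the tool that generated this report may compute quartiles or fences slightly differently, so outlier counts can differ at the margins. The function name is hypothetical.

```python
import statistics

def numeric_diagnosis(values):
    """Summarise one numeric column like a row of the Distributions table.

    Needs at least two values. Outliers use Tukey's 1.5 * IQR fences on
    inclusive (R type-7) quartiles, which is an assumption.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "Q1": q1,
        "mean": statistics.fmean(values),
        "median": median,
        "Q3": q3,
        "max": max(values),
        "zero": sum(1 for v in values if v == 0),
        "minus": sum(1 for v in values if v < 0),
        "outlier": sum(1 for v in values if v < lower or v > upper),
    }

print(numeric_diagnosis([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

The gap between mean and median is a quick skew check: for training_hours the mean (65.36) sits well above the median (47.0), consistent with its long right tail of outliers.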
Zero Values
| variables | min | median | max | zero | zero (%) |
|---|---|---|---|---|---|
| test | 0.0 | 0.0 | 0.00 | 19,189 | 100.0 |
| city_dev_index | -0.5 | 0.9 | 0.95 | 13 | 0.1 |
Minus Values
| variables | min | median | max | minus | minus (%) |
|---|---|---|---|---|---|
| city_dev_index | -0.5 | 0.9 | 0.95 | 6 | 0 |
Outliers
List of Outliers
| variables | min | median | max | outlier | outlier (%) |
|---|---|---|---|---|---|
| training_hours | 1.0 | 47.0 | 336.00 | 986 | 5.1 |
| city_dev_index | -0.5 | 0.9 | 0.95 | 36 | 0.2 |
Individual Outliers
variable: training_hours
| Measures | Values |
|---|---|
| Outliers count | 986 |
| Outliers ratio | 5.14% |
| Mean of outliers | 247.5335 |
| Mean with outliers | 65.35984 |
| Mean without outliers | 55.49206 |
variable: city_dev_index
| Measures | Values |
|---|---|
| Outliers count | 36 |
| Outliers ratio | 0.19% |
| Mean of outliers | 0.1282222 |
| Mean with outliers | 0.8278187 |
| Mean without outliers | 0.8291337 |
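The three means in each Individual Outliers table are linked: the mean with outliers is the count-weighted average of the outlier mean and the mean without outliers, over n = 19,189 rows. The sketch below checks this against the reported figures and, for raw data, computes the same measures from given lower and upper fences. The function names and the fence arguments are hypothetical.

```python
from statistics import fmean

def combined_mean(n_total, n_outliers, mean_outliers, mean_without):
    """Overall mean recovered as a count-weighted average of the two groups."""
    n_inliers = n_total - n_outliers
    return (n_outliers * mean_outliers + n_inliers * mean_without) / n_total

def outlier_measures(values, lower, upper):
    """Individual Outliers measures for one variable; values outside [lower, upper] are outliers."""
    outliers = [v for v in values if v < lower or v > upper]
    inliers = [v for v in values if lower <= v <= upper]
    return {
        "count": len(outliers),
        "ratio_pct": 100 * len(outliers) / len(values),
        "mean_of_outliers": fmean(outliers) if outliers else float("nan"),
        "mean_with": fmean(values),
        "mean_without": fmean(inliers) if inliers else float("nan"),
    }

# Figures copied from the two tables above (19,189 rows in total).
training_hours_mean = combined_mean(19_189, 986, 247.5335, 55.49206)
city_dev_index_mean = combined_mean(19_189, 36, 0.1282222, 0.8291337)
print(round(training_hours_mean, 4), round(city_dev_index_mean, 6))
```

Both recombined values agree with the reported "Mean with outliers" to the precision shown. For city_dev_index the outliers lie below the bulk of the data, so removing them raises the mean slightly, while for training_hours removing them lowers it by about 10 hours.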